Extraction Programs: A Unified Approach to Translation Rule Extraction

نویسندگان

  • Mark Hopkins
  • Greg Langmead
  • Tai Vo
چکیده

We provide a general algorithmic schema for translation rule extraction and show that several popular extraction methods (including phrase pair extraction, hierarchical phrase pair extraction, and GHKM extraction) can be viewed as specific instances of this schema. This work is primarily intended as a survey of the dominant extraction paradigms, in which we make explicit the close relationship between these approaches, and establish a language for future hybridizations. This facilitates a generic and extensible implementation of alignment-based extraction methods.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Reducing SMT Rule Table with Monolingual Key Phrase

This paper presents an effective approach to discard most entries of the rule table for statistical machine translation. The rule table is filtered by monolingual key phrases, which are extracted from source text using a technique based on term extraction. Experiments show that 78% of the rule table is reduced without worsening translation performance. In most cases, our approach results in mea...

متن کامل

Compact Rule Extraction for Hierarchical Phrase-based Translation

This paper introduces two novel approaches for extracting compact grammars for hierarchical phrase-based translation. The first is a combinatorial optimization approach and the second is a Bayesian model over Hiero grammars using Variational Bayes for inference. In contrast to the conventional Hiero (Chiang, 2007) rule extraction algorithm , our methods extract compact models reducing model siz...

متن کامل

Learning Better Rule Extraction with Translation Span Alignment

This paper presents an unsupervised approach to learning translation span alignments from parallel data that improves syntactic rule extraction by deleting spurious word alignment links and adding new valuable links based on bilingual translation span correspondences. Experiments on Chinese-English translation demonstrate improvements over standard methods for tree-to-string and tree-to-tree tr...

متن کامل

Hybrid Domain Adaptation for a Rule Based MT System

This study presents several experiments to show the power of domain-specific adaptation by means of hybrid terminology extraction mechanisms and the subsequent terminology integration into a rule based machine translation (RBMT) system, thus avoiding cumbersome human lexicon and grammar customization. Detailed evaluation reveals the great potential of this approach: Translation quality can be i...

متن کامل

Semantic Roles for String to Tree Machine Translation

We experiment with adding semantic role information to a string-to-tree machine translation system based on the rule extraction procedure of Galley et al. (2004). We compare methods based on augmenting the set of nonterminals by adding semantic role labels, and altering the rule extraction process to produce a separate set of rules for each predicate that encompass its entire predicate-argument...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2011